1129 improve api for subphases #1805

Draft: wants to merge 6 commits into develop

nlslatt (Collaborator) commented May 17, 2022

This PR moves subphase tracking out of ElementLBData and into the PhaseManager. Conversely, it moves phase tracking from the PhaseManager into ElementLBData; that second change is not essential to this PR and could be reverted.

This PR adds CollectionChainSet::nextStepCollectiveSubphase() (the name is open to change) and runSubphaseCollective to the API for advancing the subphase. It removes the API for explicitly setting the subphase on each collection element.

A subphase ends when the epoch created by nextStepCollectiveSubphase or runSubphaseCollective terminates. A subphase may also contain work that preceded the call and did not fall explicitly within an earlier subphase. If no work follows the termination of an epoch that defines a subphase, an empty subphase will follow it.
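To make these boundary rules concrete, here is a minimal, self-contained model of them. It is a sketch only: SubphaseModel, recordWork, onSubphaseEpochTerminated and exampleLoads are hypothetical names invented for illustration, not part of vt, and the model assumes a single rank where load is a plain number charged to whichever subphase is currently open.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of the subphase rules described above; the class and
// method names are illustrative only and are not part of vt.
class SubphaseModel {
public:
  // One load entry per subphase; subphase 0 is open at construction.
  SubphaseModel() : loads_(1, 0.0) {}

  // Charge work to whichever subphase is currently open.
  void recordWork(double load) {
    assert(load >= 0.0);
    loads_.back() += load;
  }

  // Called when the epoch created by the subphase call terminates: the open
  // subphase closes and a new, initially empty, subphase opens.
  void onSubphaseEpochTerminated() { loads_.push_back(0.0); }

  const std::vector<double>& loads() const { return loads_; }

private:
  std::vector<double> loads_;
};

// Walks through the rules: earlier untracked work joins the next subphase to
// close, and a final boundary with no work after it leaves an empty subphase.
std::vector<double> exampleLoads() {
  SubphaseModel model;
  model.recordWork(1.0);             // ran before the call, in no earlier subphase
  model.recordWork(2.0);             // ran inside the subphase epoch
  model.onSubphaseEpochTerminated(); // subphase 0 closes with load 3
  model.recordWork(4.0);             // work inside the second subphase epoch
  model.onSubphaseEpochTerminated(); // subphase 1 closes with load 4
  return model.loads();              // {3, 4, 0}: subphase 2 stays empty
}
```

In this model, exampleLoads() returns {3, 4, 0}. The work recorded before the first call is folded into subphase 0, and the trailing empty entry is the empty subphase that follows an epoch with no subsequent work.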

Closes #1129

nlslatt (Collaborator, Author) commented May 17, 2022

The current implementation appears to have a race condition. Does the action triggered at the end of an epoch run immediately upon termination, or at some unspecified point afterward?
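The question matters because of where the subphase boundary lands. The toy model below compares the two possibilities. It assumes a single rank with a FIFO scheduler, and it assumes that a handler for the next subphase, sent outside the subphase epoch, is already pending when that epoch terminates. ToyScheduler, ToyTracker and simulate are hypothetical names, and the model does not reproduce how vt's scheduler or termination detector actually dispatch end-of-epoch actions.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical single-rank scheduler that runs queued actions in FIFO order.
// This is a toy model, not vt's scheduler or termination detector.
struct ToyScheduler {
  std::deque<std::function<void()>> queue;

  void enqueue(std::function<void()> action) { queue.push_back(std::move(action)); }

  void runAll() {
    while (!queue.empty()) {
      std::function<void()> action = std::move(queue.front());
      queue.pop_front();
      action();
    }
  }
};

// Hypothetical per-subphase load tracker; the last entry is the open subphase.
struct ToyTracker {
  std::vector<double> loads{0.0};

  void record(double load) {
    assert(load >= 0.0);
    loads.back() += load;
  }

  void advance() { loads.push_back(0.0); }
};

// One unit of work runs inside the subphase epoch, and a handler meant for the
// next subphase (sent outside that epoch) is already pending when the epoch
// terminates. The outcome depends on whether the end-of-epoch action that
// advances the subphase runs inline at termination or is queued behind it.
std::vector<double> simulate(bool actionRunsInline) {
  ToyScheduler sched;
  ToyTracker tracker;
  tracker.record(1.0);                                // work in the subphase epoch
  sched.enqueue([&tracker] { tracker.record(2.0); }); // next-subphase work, already pending
  // The subphase epoch terminates at this point.
  if (actionRunsInline) {
    tracker.advance();                                // boundary placed exactly at termination
  } else {
    sched.enqueue([&tracker] { tracker.advance(); }); // boundary placed at some later point
  }
  sched.runAll();
  return tracker.loads;
}
```

Here simulate(true) gives {1, 2}, while simulate(false) gives {3, 0}. When the action is deferred, the next subphase's work is charged to the previous one and the next subphase appears empty. Under these assumptions, the attribution depends on message ordering, which might explain why only the multi-rank subphase tests fail, and not identically across builds.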

@github-actions

PR tests (clang-5.0, ubuntu, mpich)

Build for ce6b977



The following tests FAILED:
  387 - vt:TestSubphaseManagement.test_subphase_collective_with_non_1_proc_2 (Failed)
  389 - vt:TestSubphaseManagement.test_subphase_collective_nested_with_non_proc_2 (Failed)

Build log

@github-actions

PR tests (gcc-5, ubuntu, mpich)

Build for ce6b977



The following tests FAILED:
  386 - vt:TestSubphaseManagement.test_subphase_collective_proc_2 (Failed)
  387 - vt:TestSubphaseManagement.test_subphase_collective_with_non_1_proc_2 (Failed)
  389 - vt:TestSubphaseManagement.test_subphase_collective_nested_with_non_proc_2 (Failed)

Build log

@github-actions

PR tests (clang-3.9, ubuntu, mpich)

Build for ce6b977



The following tests FAILED:
  386 - vt:TestSubphaseManagement.test_subphase_collective_proc_2 (Failed)
  387 - vt:TestSubphaseManagement.test_subphase_collective_with_non_1_proc_2 (Failed)
  388 - vt:TestSubphaseManagement.test_subphase_collective_with_non_2_proc_2 (Failed)
  389 - vt:TestSubphaseManagement.test_subphase_collective_nested_with_non_proc_2 (Failed)

Build log

@github-actions

PR tests (gcc-6, ubuntu, mpich)

Build for ce6b977



The following tests FAILED:
  386 - vt:TestSubphaseManagement.test_subphase_collective_proc_2 (Failed)
  387 - vt:TestSubphaseManagement.test_subphase_collective_with_non_1_proc_2 (Failed)
  388 - vt:TestSubphaseManagement.test_subphase_collective_with_non_2_proc_2 (Failed)
  389 - vt:TestSubphaseManagement.test_subphase_collective_nested_with_non_proc_2 (Failed)

Build log

@github-actions

PR tests (gcc-7, ubuntu, mpich, trace runtime, LB)

Build for ce6b977



The following tests FAILED:
  386 - vt:TestSubphaseManagement.test_subphase_collective_proc_2 (Failed)

Build log

@github-actions

PR tests (gcc-10, ubuntu, openmpi, no LB)

Build for ce6b977

Compilation - successful

Testing - passed

Build log

@github-actions

PR tests (gcc-9, ubuntu, mpich, zoltan)

Build for ce6b977



The following tests FAILED:
  386 - vt:TestSubphaseManagement.test_subphase_collective_proc_2 (Failed)
  387 - vt:TestSubphaseManagement.test_subphase_collective_with_non_1_proc_2 (Failed)
  388 - vt:TestSubphaseManagement.test_subphase_collective_with_non_2_proc_2 (Failed)

Build log

@github-actions

PR tests (clang-9, ubuntu, mpich)

Build for ce6b977



The following tests FAILED:
  386 - vt:TestSubphaseManagement.test_subphase_collective_proc_2 (Failed)
  387 - vt:TestSubphaseManagement.test_subphase_collective_with_non_1_proc_2 (Failed)
  388 - vt:TestSubphaseManagement.test_subphase_collective_with_non_2_proc_2 (Failed)

Build log

@github-actions

PR tests (gcc-8, ubuntu, mpich, address sanitizer)

Build for ce6b977



The following tests FAILED:
  386 - vt:TestSubphaseManagement.test_subphase_collective_proc_2 (Failed)
  387 - vt:TestSubphaseManagement.test_subphase_collective_with_non_1_proc_2 (Failed)
  389 - vt:TestSubphaseManagement.test_subphase_collective_nested_with_non_proc_2 (Failed)

Build log

@github-actions

PR tests (clang-12, ubuntu, mpich)

Build for ce6b977

In file included from src/CMakeFiles/vt.dir/Unity/unity_14_cxx.cxx:7:
/vt/src/vt/utils/memory/memory_usage.cc:149:26: warning: 'mallinfo' is deprecated [-Wdeprecated-declarations]
    struct mallinfo mi = mallinfo();
                         ^
/usr/include/malloc.h:114:48: note: 'mallinfo' has been explicitly marked deprecated here
extern struct mallinfo mallinfo (void) __THROW __MALLOC_DEPRECATED;
                                               ^
/usr/include/malloc.h:32:30: note: expanded from macro '__MALLOC_DEPRECATED'
# define __MALLOC_DEPRECATED __attribute_deprecated__
                             ^
/usr/include/x86_64-linux-gnu/sys/cdefs.h:339:51: note: expanded from macro '__attribute_deprecated__'
# define __attribute_deprecated__ __attribute__ ((__deprecated__))
                                                  ^
1 warning generated.

The following tests FAILED:
  386 - vt:TestSubphaseManagement.test_subphase_collective_proc_2 (Failed)
  387 - vt:TestSubphaseManagement.test_subphase_collective_with_non_1_proc_2 (Failed)
  388 - vt:TestSubphaseManagement.test_subphase_collective_with_non_2_proc_2 (Failed)
  389 - vt:TestSubphaseManagement.test_subphase_collective_nested_with_non_proc_2 (Failed)

Build log

@github-actions

PR tests (clang-13, ubuntu, mpich)

Build for ce6b977

The following tests FAILED:
  386 - vt:TestSubphaseManagement.test_subphase_collective_proc_2 (Failed)
  387 - vt:TestSubphaseManagement.test_subphase_collective_with_non_1_proc_2 (Failed)
  388 - vt:TestSubphaseManagement.test_subphase_collective_with_non_2_proc_2 (Failed)
  389 - vt:TestSubphaseManagement.test_subphase_collective_nested_with_non_proc_2 (Failed)

Build log

@github-actions

PR tests (clang-11, ubuntu, mpich)

Build for ce6b977

The following tests FAILED:
  388 - vt:TestSubphaseManagement.test_subphase_collective_with_non_2_proc_2 (Failed)

Build log

@github-actions

PR tests (intel icpx, ubuntu, mpich)

Build for ce6b977



The following tests FAILED:
  306 - vt:TestSubphaseManagement.test_subphase_collective_with_non_2_proc_2 (Failed)

Build log

@github-actions

PR tests (clang-10, alpine, mpich)

Build for ce6b977



The following tests FAILED:
  386 - vt:TestSubphaseManagement.test_subphase_collective_proc_2 (Failed)
  387 - vt:TestSubphaseManagement.test_subphase_collective_with_non_1_proc_2 (Failed)
  388 - vt:TestSubphaseManagement.test_subphase_collective_with_non_2_proc_2 (Failed)
  389 - vt:TestSubphaseManagement.test_subphase_collective_nested_with_non_proc_2 (Failed)

Build log

@github-actions

PR tests (nvidia cuda 10.1, ubuntu, mpich)

Build for ce6b977



The following tests FAILED:
  307 - vt:TestSubphaseManagement.test_subphase_collective_nested_with_non_proc_2 (Failed)

Build log

@github-actions

PR tests (nvidia cuda 11.0, ubuntu, mpich)

Build for ce6b977



The following tests FAILED:
  428 - vt:TestSubphaseManagement.test_subphase_collective_with_non_1_proc_2 (Failed)
  429 - vt:TestSubphaseManagement.test_subphase_collective_with_non_2_proc_2 (Failed)
  440 - vt:TestSubphaseManagement.test_subphase_collective_proc_4 (Failed)
  441 - vt:TestSubphaseManagement.test_subphase_collective_with_non_1_proc_4 (Failed)
  442 - vt:TestSubphaseManagement.test_subphase_collective_with_non_2_proc_4 (Failed)
  443 - vt:TestSubphaseManagement.test_subphase_collective_nested_with_non_proc_4 (Failed)

Build log

@github-actions

PR tests (clang-14, ubuntu, mpich)

Build for 9946d16



The following tests FAILED:
  386 - vt:TestSubphaseManagement.test_subphase_collective_proc_2 (Failed)
  387 - vt:TestSubphaseManagement.test_subphase_collective_with_non_1_proc_2 (Failed)
  388 - vt:TestSubphaseManagement.test_subphase_collective_with_non_2_proc_2 (Failed)
  389 - vt:TestSubphaseManagement.test_subphase_collective_nested_with_non_proc_2 (Failed)

Build log

@github-actions

PR tests (gcc-11, ubuntu, mpich)

Build for 9946d16



The following tests FAILED:
  386 - vt:TestSubphaseManagement.test_subphase_collective_proc_2 (Failed)
  387 - vt:TestSubphaseManagement.test_subphase_collective_with_non_1_proc_2 (Failed)
  388 - vt:TestSubphaseManagement.test_subphase_collective_with_non_2_proc_2 (Failed)
  389 - vt:TestSubphaseManagement.test_subphase_collective_nested_with_non_proc_2 (Failed)

Build log

@github-actions

PR tests (clang-10, ubuntu, mpich)

Build for ce6b977



The following tests FAILED:
  563 - vt:TestSubphaseManagement.test_subphase_collective_proc_4 (Failed)
  564 - vt:TestSubphaseManagement.test_subphase_collective_with_non_1_proc_4 (Failed)
  565 - vt:TestSubphaseManagement.test_subphase_collective_with_non_2_proc_4 (Failed)
  566 - vt:TestSubphaseManagement.test_subphase_collective_nested_with_non_proc_4 (Failed)

Build log

@github-actions

PR tests (gcc-12, ubuntu, mpich)

Build for 9946d16



The following tests FAILED:
  386 - vt:TestSubphaseManagement.test_subphase_collective_proc_2 (Failed)
  387 - vt:TestSubphaseManagement.test_subphase_collective_with_non_1_proc_2 (Failed)
  388 - vt:TestSubphaseManagement.test_subphase_collective_with_non_2_proc_2 (Failed)

Build log

Successfully merging this pull request may close this issue: Improve API for subphase shifting
1 participant